Cleaning up the driver data into a usable state before analysis.

### Merging RB and Racing Bulls
driver_names <- driver_names %>%
  mutate(team_name = ifelse(team_name == "RB", "Racing Bulls", team_name))

In this section I clean pitstop_data$pit_duration by removing NA values and outliers.

### Filtering Down to Race Pitstops
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   12.80   21.80   23.40  104.20   27.15 2485.90
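# Drop missing durations and anything longer than 150 s; the explicit is.na() check
# prevents NA comparisons from producing all-NA rows
pitstop_data <- pitstop_data[!is.na(pitstop_data$pit_duration) & pitstop_data$pit_duration <= 150, ]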
pitstop_duration_Q1 <- quantile(pitstop_data$pit_duration, 0.25, na.rm = TRUE)
pitstop_duration_Q3 <- quantile(pitstop_data$pit_duration, 0.75, na.rm = TRUE)
pitstop_iqr <- pitstop_duration_Q3 - pitstop_duration_Q1  # Equivalent to IQR(pitstop_data$pit_duration, na.rm = TRUE)
pitstop_median <- median(pitstop_data$pit_duration, na.rm = TRUE)
summary(pitstop_data$pit_duration)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   12.80   21.70   23.30   24.27   25.30   93.20
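The quartiles and IQR computed above could feed a standard Tukey-fence filter, where values more than 1.5 × IQR beyond the quartiles are dropped. The sketch below shows the idea on made-up durations. The `durations` vector and the 1.5 multiplier are illustrative assumptions, not values or steps taken from this analysis.

```r
# Toy durations in seconds (made-up values, not from pitstop_data)
durations <- c(21.5, 22.0, 22.8, 23.1, 23.4, 24.0, 25.2, 26.0, 48.7, NA)

# Quartiles and IQR, ignoring missing values
q1  <- unname(quantile(durations, 0.25, na.rm = TRUE))
q3  <- unname(quantile(durations, 0.75, na.rm = TRUE))
iqr <- q3 - q1

# Tukey fences: 1.5 * IQR below Q1 and above Q3
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Keep only non-missing values that fall inside the fences
keep    <- !is.na(durations) & durations >= lower_fence & durations <= upper_fence
cleaned <- durations[keep]
```

Whether fences like these are preferable to a fixed cutoff such as 150 s depends on how many slow but legitimate stops should stay in the data.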
First, I am creating a demonstration of an interactive set of stacked boxplots, one per team. It is not free of issues yet, but it is the beginning of my visualization design.
TeamBoxplot <- ggplot(pitstop_data, aes(x = reorder(team_name, pit_duration, median), y = pit_duration, fill = team_colour)) +
  geom_boxplot(alpha = 0.6, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.3, color = "black") +
  coord_flip() +
  labs(title = "Pit Stop Duration by Team", x = "Team", y = "Pit Stop Duration (s)") +
  theme_bw() +
  theme(legend.position = "none") +
  scale_fill_identity()

ggplotly(TeamBoxplot, width = 1200, height = 700) %>%
  layout(autosize = TRUE)
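The plot relies on `reorder(team_name, pit_duration, median)` to sort teams by their median stop time. A small sketch of what that call does, using hypothetical teams and durations rather than the real data:

```r
# Toy data (hypothetical team names and durations)
toy <- data.frame(
  team_name    = c("A", "A", "B", "B", "C", "C"),
  pit_duration = c(24.0, 26.0, 21.0, 22.0, 23.0, 23.5)
)

# Median stop time per team, then team names from fastest to slowest
team_medians    <- tapply(toy$pit_duration, toy$team_name, median)
teams_by_median <- names(team_medians)[order(team_medians)]

# reorder() returns a factor whose levels follow the same ascending-median order
ordered_levels <- levels(reorder(toy$team_name, toy$pit_duration, median))
```

Because the first factor level is drawn closest to the origin, the team with the lowest median ends up at the bottom of the chart after `coord_flip()`.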